{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "79f24a6b",
   "metadata": {},
   "source": [
    "# File Directory\n",
    "\n",
    "This covers how to use the `DirectoryLoader` to load all documents in a directory. Under the hood, by default this uses the [UnstructuredLoader](./unstructured_file.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "019d8520",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import DirectoryLoader"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c76cdc5",
   "metadata": {},
   "source": [
    "We can use the `glob` parameter to control which files to load. Note that here it doesn't load the `.rst` file or the `.ipynb` files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "891fe56f",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = DirectoryLoader('../', glob=\"**/*.md\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "addfe9cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "b042086d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e633d62f",
   "metadata": {},
   "source": [
    "## Show a progress bar"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43911860",
   "metadata": {},
   "source": [
    "By default a progress bar will not be shown. To show a progress bar, install the `tqdm` library (e.g. `pip install tqdm`), and set the `show_progress` parameter to `True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "bb93daac",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: tqdm in /Users/jon/.pyenv/versions/3.9.16/envs/microbiome-app/lib/python3.9/site-packages (4.65.0)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "0it [00:00, ?it/s]\n"
     ]
    }
   ],
   "source": [
    "%pip install tqdm\n",
    "loader = DirectoryLoader('../', glob=\"**/*.md\", show_progress=True)\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c16ed46a",
   "metadata": {},
   "source": [
    "## Use multithreading"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5752e23e",
   "metadata": {},
   "source": [
    "By default the loading happens in one thread. In order to utilize several threads set the `use_multithreading` flag to true."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f8d84f52",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = DirectoryLoader('../', glob=\"**/*.md\", use_multithreading=True)\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5652850",
   "metadata": {},
   "source": [
    "## Change loader class\n",
    "By default this uses the `UnstructuredLoader` class. However, you can change up the type of loader pretty easily."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "81c92da3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import TextLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "ab38ee36",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = DirectoryLoader('../', glob=\"**/*.md\", loader_cls=TextLoader)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "25c8740f",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "38337763",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "598a2805",
   "metadata": {},
   "source": [
    "If you need to load Python source code files, use the `PythonLoader`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "c558bd73",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import PythonLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "a3cfaba7",
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = DirectoryLoader('../../../../../', glob=\"**/*.py\", loader_cls=PythonLoader)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "e2e1e26a",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "ffb8ff36",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "691"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6411a0cb",
   "metadata": {},
   "source": [
    "## Auto detect file encodings with TextLoader\n",
    "\n",
    "In this example we will see some strategies that can be useful when loading a big list of arbitrary files from a directory using the `TextLoader` class.\n",
    "\n",
    "First to illustrate the problem, let's try to load multiple text with arbitrary encodings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "2c787a69",
   "metadata": {},
   "outputs": [],
   "source": [
    "path = '../../../../../tests/integration_tests/examples'\n",
    "loader = DirectoryLoader(path, glob=\"**/*.txt\", loader_cls=TextLoader)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9001e12",
   "metadata": {},
   "source": [
    "### A. Default Behavior"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "b1e88c31",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800000; text-decoration-color: #800000\">╭─────────────────────────────── </span><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Traceback </span><span style=\"color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold\">(most recent call last)</span><span style=\"color: #800000; text-decoration-color: #800000\"> ────────────────────────────────╮</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/data/source/langchain/langchain/document_loaders/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">text.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">29</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">load</span>                             <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">26 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span>text = <span style=\"color: #808000; text-decoration-color: #808000\">\"\"</span>                                                                           <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">27 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">with</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">open</span>(<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.file_path, encoding=<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.encoding) <span style=\"color: #0000ff; text-decoration-color: #0000ff\">as</span> f:                             <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">28 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">try</span>:                                                                            <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>29 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   </span>text = f.read()                                                             <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">30 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">except</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">UnicodeDecodeError</span> <span style=\"color: #0000ff; text-decoration-color: #0000ff\">as</span> e:                                                 <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">31 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.autodetect_encoding:                                                <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">32 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   </span>detected_encodings = <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.detect_file_encodings()                       <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/home/spike/.pyenv/versions/3.9.11/lib/python3.9/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">codecs.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">322</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">decode</span>                         <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 319 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">def</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">decode</span>(<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>, <span style=\"color: #00ffff; text-decoration-color: #00ffff\">input</span>, final=<span style=\"color: #0000ff; text-decoration-color: #0000ff\">False</span>):                                                 <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 320 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"># decode input (taking the buffer into account)</span>                                   <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 321 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span>data = <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.buffer + <span style=\"color: #00ffff; text-decoration-color: #00ffff\">input</span>                                                        <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span> 322 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span>(result, consumed) = <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>._buffer_decode(data, <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.errors, final)                <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 323 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"># keep undecoded input until the next call</span>                                        <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 324 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span><span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.buffer = data[consumed:]                                                     <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\"> 325 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">return</span> result                                                                     <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">╰──────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
       "<span style=\"color: #ff0000; text-decoration-color: #ff0000; font-weight: bold\">UnicodeDecodeError: </span><span style=\"color: #008000; text-decoration-color: #008000\">'utf-8'</span> codec can't decode byte <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0xca</span> in position <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">0</span>: invalid continuation byte\n",
       "\n",
       "<span style=\"font-style: italic\">The above exception was the direct cause of the following exception:</span>\n",
       "\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">╭─────────────────────────────── </span><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Traceback </span><span style=\"color: #bf7f7f; text-decoration-color: #bf7f7f; font-weight: bold\">(most recent call last)</span><span style=\"color: #800000; text-decoration-color: #800000\"> ────────────────────────────────╮</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">&lt;module&gt;</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">1</span>                                                                                    <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>1 loader.load()                                                                                <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">2 </span>                                                                                             <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/data/source/langchain/langchain/document_loaders/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">directory.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">84</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">load</span>                        <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">81 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.silent_errors:                                              <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">82 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   │   </span>logger.warning(e)                                               <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">83 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">else</span>:                                                               <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>84 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">raise</span> e                                                         <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">85 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">finally</span>:                                                                <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">86 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> pbar:                                                            <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">87 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   │   </span>pbar.update(<span style=\"color: #0000ff; text-decoration-color: #0000ff\">1</span>)                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/data/source/langchain/langchain/document_loaders/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">directory.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">78</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">load</span>                        <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">75 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> i.is_file():                                                                 <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">76 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> _is_visible(i.relative_to(p)) <span style=\"color: #ff00ff; text-decoration-color: #ff00ff\">or</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.load_hidden:                       <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">77 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">try</span>:                                                                    <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>78 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span>sub_docs = <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.loader_cls(<span style=\"color: #00ffff; text-decoration-color: #00ffff\">str</span>(i), **<span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.loader_kwargs).load()     <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">79 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span>docs.extend(sub_docs)                                               <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">80 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">except</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">Exception</span> <span style=\"color: #0000ff; text-decoration-color: #0000ff\">as</span> e:                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">81 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">if</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.silent_errors:                                              <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #bfbf7f; text-decoration-color: #bfbf7f\">/data/source/langchain/langchain/document_loaders/</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">text.py</span>:<span style=\"color: #0000ff; text-decoration-color: #0000ff\">44</span> in <span style=\"color: #00ff00; text-decoration-color: #00ff00\">load</span>                             <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>                                                                                                  <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">41 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">except</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">UnicodeDecodeError</span>:                                          <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">42 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">continue</span>                                                        <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">43 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">else</span>:                                                                       <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span> <span style=\"color: #800000; text-decoration-color: #800000\">❱ </span>44 <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">raise</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">RuntimeError</span>(<span style=\"color: #808000; text-decoration-color: #808000\">f\"Error loading {</span><span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.file_path<span style=\"color: #808000; text-decoration-color: #808000\">}\"</span>) <span style=\"color: #0000ff; text-decoration-color: #0000ff\">from</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff; text-decoration: underline\">e</span>            <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">45 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">except</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">Exception</span> <span style=\"color: #0000ff; text-decoration-color: #0000ff\">as</span> e:                                                          <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">46 </span><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">│   │   │   │   </span><span style=\"color: #0000ff; text-decoration-color: #0000ff\">raise</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff\">RuntimeError</span>(<span style=\"color: #808000; text-decoration-color: #808000\">f\"Error loading {</span><span style=\"color: #00ffff; text-decoration-color: #00ffff\">self</span>.file_path<span style=\"color: #808000; text-decoration-color: #808000\">}\"</span>) <span style=\"color: #0000ff; text-decoration-color: #0000ff\">from</span> <span style=\"color: #00ffff; text-decoration-color: #00ffff; text-decoration: underline\">e</span>                <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">│</span>   <span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">47 </span>                                                                                            <span style=\"color: #800000; text-decoration-color: #800000\">│</span>\n",
       "<span style=\"color: #800000; text-decoration-color: #800000\">╰──────────────────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
       "<span style=\"color: #ff0000; text-decoration-color: #ff0000; font-weight: bold\">RuntimeError: </span>Error loading ..<span style=\"color: #800080; text-decoration-color: #800080\">/../../../../tests/integration_tests/examples/</span><span style=\"color: #ff00ff; text-decoration-color: #ff00ff\">example-non-utf8.txt</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[31m╭─\u001b[0m\u001b[31m──────────────────────────────\u001b[0m\u001b[31m \u001b[0m\u001b[1;31mTraceback \u001b[0m\u001b[1;2;31m(most recent call last)\u001b[0m\u001b[31m \u001b[0m\u001b[31m───────────────────────────────\u001b[0m\u001b[31m─╮\u001b[0m\n",
       "\u001b[31m│\u001b[0m \u001b[2;33m/data/source/langchain/langchain/document_loaders/\u001b[0m\u001b[1;33mtext.py\u001b[0m:\u001b[94m29\u001b[0m in \u001b[92mload\u001b[0m                             \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m26 \u001b[0m\u001b[2m│   │   \u001b[0mtext = \u001b[33m\"\u001b[0m\u001b[33m\"\u001b[0m                                                                           \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m27 \u001b[0m\u001b[2m│   │   \u001b[0m\u001b[94mwith\u001b[0m \u001b[96mopen\u001b[0m(\u001b[96mself\u001b[0m.file_path, encoding=\u001b[96mself\u001b[0m.encoding) \u001b[94mas\u001b[0m f:                             \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m28 \u001b[0m\u001b[2m│   │   │   \u001b[0m\u001b[94mtry\u001b[0m:                                                                            \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m29 \u001b[2m│   │   │   │   \u001b[0mtext = f.read()                                                             \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m30 \u001b[0m\u001b[2m│   │   │   \u001b[0m\u001b[94mexcept\u001b[0m \u001b[96mUnicodeDecodeError\u001b[0m \u001b[94mas\u001b[0m e:                                                 \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m31 \u001b[0m\u001b[2m│   │   │   │   \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m.autodetect_encoding:                                                \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m32 \u001b[0m\u001b[2m│   │   │   │   │   \u001b[0mdetected_encodings = \u001b[96mself\u001b[0m.detect_file_encodings()                       \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m \u001b[2;33m/home/spike/.pyenv/versions/3.9.11/lib/python3.9/\u001b[0m\u001b[1;33mcodecs.py\u001b[0m:\u001b[94m322\u001b[0m in \u001b[92mdecode\u001b[0m                         \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m 319 \u001b[0m\u001b[2m│   \u001b[0m\u001b[94mdef\u001b[0m \u001b[92mdecode\u001b[0m(\u001b[96mself\u001b[0m, \u001b[96minput\u001b[0m, final=\u001b[94mFalse\u001b[0m):                                                 \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m 320 \u001b[0m\u001b[2m│   │   \u001b[0m\u001b[2m# decode input (taking the buffer into account)\u001b[0m                                   \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m 321 \u001b[0m\u001b[2m│   │   \u001b[0mdata = \u001b[96mself\u001b[0m.buffer + \u001b[96minput\u001b[0m                                                        \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m 322 \u001b[2m│   │   \u001b[0m(result, consumed) = \u001b[96mself\u001b[0m._buffer_decode(data, \u001b[96mself\u001b[0m.errors, final)                \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m 323 \u001b[0m\u001b[2m│   │   \u001b[0m\u001b[2m# keep undecoded input until the next call\u001b[0m                                        \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m 324 \u001b[0m\u001b[2m│   │   \u001b[0m\u001b[96mself\u001b[0m.buffer = data[consumed:]                                                     \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m 325 \u001b[0m\u001b[2m│   │   \u001b[0m\u001b[94mreturn\u001b[0m result                                                                     \u001b[31m│\u001b[0m\n",
       "\u001b[31m╰──────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n",
       "\u001b[1;91mUnicodeDecodeError: \u001b[0m\u001b[32m'utf-8'\u001b[0m codec can't decode byte \u001b[1;36m0xca\u001b[0m in position \u001b[1;36m0\u001b[0m: invalid continuation byte\n",
       "\n",
       "\u001b[3mThe above exception was the direct cause of the following exception:\u001b[0m\n",
       "\n",
       "\u001b[31m╭─\u001b[0m\u001b[31m──────────────────────────────\u001b[0m\u001b[31m \u001b[0m\u001b[1;31mTraceback \u001b[0m\u001b[1;2;31m(most recent call last)\u001b[0m\u001b[31m \u001b[0m\u001b[31m───────────────────────────────\u001b[0m\u001b[31m─╮\u001b[0m\n",
       "\u001b[31m│\u001b[0m in \u001b[92m<module>\u001b[0m:\u001b[94m1\u001b[0m                                                                                    \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m1 loader.load()                                                                                \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m2 \u001b[0m                                                                                             \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m \u001b[2;33m/data/source/langchain/langchain/document_loaders/\u001b[0m\u001b[1;33mdirectory.py\u001b[0m:\u001b[94m84\u001b[0m in \u001b[92mload\u001b[0m                        \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m81 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m.silent_errors:                                              \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m82 \u001b[0m\u001b[2m│   │   │   │   │   │   │   \u001b[0mlogger.warning(e)                                               \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m83 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0m\u001b[94melse\u001b[0m:                                                               \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m84 \u001b[2m│   │   │   │   │   │   │   \u001b[0m\u001b[94mraise\u001b[0m e                                                         \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m85 \u001b[0m\u001b[2m│   │   │   │   │   \u001b[0m\u001b[94mfinally\u001b[0m:                                                                \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m86 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0m\u001b[94mif\u001b[0m pbar:                                                            \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m87 \u001b[0m\u001b[2m│   │   │   │   │   │   │   \u001b[0mpbar.update(\u001b[94m1\u001b[0m)                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m \u001b[2;33m/data/source/langchain/langchain/document_loaders/\u001b[0m\u001b[1;33mdirectory.py\u001b[0m:\u001b[94m78\u001b[0m in \u001b[92mload\u001b[0m                        \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m75 \u001b[0m\u001b[2m│   │   │   \u001b[0m\u001b[94mif\u001b[0m i.is_file():                                                                 \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m76 \u001b[0m\u001b[2m│   │   │   │   \u001b[0m\u001b[94mif\u001b[0m _is_visible(i.relative_to(p)) \u001b[95mor\u001b[0m \u001b[96mself\u001b[0m.load_hidden:                       \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m77 \u001b[0m\u001b[2m│   │   │   │   │   \u001b[0m\u001b[94mtry\u001b[0m:                                                                    \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m78 \u001b[2m│   │   │   │   │   │   \u001b[0msub_docs = \u001b[96mself\u001b[0m.loader_cls(\u001b[96mstr\u001b[0m(i), **\u001b[96mself\u001b[0m.loader_kwargs).load()     \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m79 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0mdocs.extend(sub_docs)                                               \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m80 \u001b[0m\u001b[2m│   │   │   │   │   \u001b[0m\u001b[94mexcept\u001b[0m \u001b[96mException\u001b[0m \u001b[94mas\u001b[0m e:                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m81 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0m\u001b[94mif\u001b[0m \u001b[96mself\u001b[0m.silent_errors:                                              \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m \u001b[2;33m/data/source/langchain/langchain/document_loaders/\u001b[0m\u001b[1;33mtext.py\u001b[0m:\u001b[94m44\u001b[0m in \u001b[92mload\u001b[0m                             \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m                                                                                                  \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m41 \u001b[0m\u001b[2m│   │   │   │   │   │   \u001b[0m\u001b[94mexcept\u001b[0m \u001b[96mUnicodeDecodeError\u001b[0m:                                          \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m42 \u001b[0m\u001b[2m│   │   │   │   │   │   │   \u001b[0m\u001b[94mcontinue\u001b[0m                                                        \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m43 \u001b[0m\u001b[2m│   │   │   │   \u001b[0m\u001b[94melse\u001b[0m:                                                                       \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m \u001b[31m❱ \u001b[0m44 \u001b[2m│   │   │   │   │   \u001b[0m\u001b[94mraise\u001b[0m \u001b[96mRuntimeError\u001b[0m(\u001b[33mf\u001b[0m\u001b[33m\"\u001b[0m\u001b[33mError loading \u001b[0m\u001b[33m{\u001b[0m\u001b[96mself\u001b[0m.file_path\u001b[33m}\u001b[0m\u001b[33m\"\u001b[0m) \u001b[94mfrom\u001b[0m \u001b[4;96me\u001b[0m            \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m45 \u001b[0m\u001b[2m│   │   │   \u001b[0m\u001b[94mexcept\u001b[0m \u001b[96mException\u001b[0m \u001b[94mas\u001b[0m e:                                                          \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m46 \u001b[0m\u001b[2m│   │   │   │   \u001b[0m\u001b[94mraise\u001b[0m \u001b[96mRuntimeError\u001b[0m(\u001b[33mf\u001b[0m\u001b[33m\"\u001b[0m\u001b[33mError loading \u001b[0m\u001b[33m{\u001b[0m\u001b[96mself\u001b[0m.file_path\u001b[33m}\u001b[0m\u001b[33m\"\u001b[0m) \u001b[94mfrom\u001b[0m \u001b[4;96me\u001b[0m                \u001b[31m│\u001b[0m\n",
       "\u001b[31m│\u001b[0m   \u001b[2m47 \u001b[0m                                                                                            \u001b[31m│\u001b[0m\n",
       "\u001b[31m╰──────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n",
       "\u001b[1;91mRuntimeError: \u001b[0mError loading ..\u001b[35m/../../../../tests/integration_tests/examples/\u001b[0m\u001b[95mexample-non-utf8.txt\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da554f9a",
   "metadata": {},
   "source": [
    "The file `example-non-utf8.txt` uses a different encoding the `load()` function fails with a helpful message indicating which file failed decoding. \n",
    "\n",
    "With the default behavior of `TextLoader` any failure to load any of the documents will fail the whole loading process and no documents are loaded. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb844b7e",
   "metadata": {},
   "source": [
    "### B. Silent fail\n",
    "\n",
    "We can pass the parameter `silent_errors` to the `DirectoryLoader` to skip the files which could not be loaded and continue the load process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "5314ec39",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Error loading ../../../../../tests/integration_tests/examples/example-non-utf8.txt\n"
     ]
    }
   ],
   "source": [
    "loader = DirectoryLoader(path, glob=\"**/*.txt\", loader_cls=TextLoader, silent_errors=True)\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "216337e5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['../../../../../tests/integration_tests/examples/whatsapp_chat.txt',\n",
       " '../../../../../tests/integration_tests/examples/example-utf8.txt']"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "doc_sources = [doc.metadata['source']  for doc in docs]\n",
    "doc_sources"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cba0e53",
   "metadata": {},
   "source": [
    "### C. Auto detect encodings\n",
    "\n",
    "We can also ask `TextLoader` to auto detect the file encoding before failing, by passing the `autodetect_encoding` to the loader class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "96d527d2",
   "metadata": {},
   "outputs": [],
   "source": [
    "text_loader_kwargs={'autodetect_encoding': True}\n",
    "loader = DirectoryLoader(path, glob=\"**/*.txt\", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)\n",
    "docs = loader.load()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "f1a136a5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['../../../../../tests/integration_tests/examples/example-non-utf8.txt',\n",
       " '../../../../../tests/integration_tests/examples/whatsapp_chat.txt',\n",
       " '../../../../../tests/integration_tests/examples/example-utf8.txt']"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "doc_sources = [doc.metadata['source']  for doc in docs]\n",
    "doc_sources"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
